{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 24 - Understanding the Titanic data\n", "\n", "On April 15, 1912, the \"unsinkable\" ship [Titanic](https://en.wikipedia.org/wiki/RMS_Titanic) sank after hitting an iceberg. The Titanic was on her maiden voyage from Europe to New York, with passengers boarding in Southampton in the UK, Cherbourg in France, and Queenstown in Ireland.\n", "\n", "In the next two labs, we will learn how to use Python to make predictions (\"machine learning\") about whether a passenger survived or perished.\n", "\n", "In this lab, we will get familiar with the data. A description of the data is available [here](https://www.kaggle.com/c/titanic/data). The passengers have been split into two sets. We will look at the first set, stored in [train.csv](http://comet.lehman.cuny.edu/owen/teaching/mat128/train.csv). These passengers are the *training data*, since we know whether each passenger survived. Next class, we will make predictions for the passengers in the *test data*, which does not record whether those passengers survived." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the training data into a pandas DataFrame called `train`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which columns are quantitative (numerical)? Which are qualitative (categorical)?\n", "\n", "We can see summary statistics for the numeric columns using the following command." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# By default, describe() summarizes only the numeric columns\n", "train.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What other information is given about the columns?\n", "\n", "To see information about the qualitative columns, use the following command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# include=[\"O\"] selects the columns with object (string) dtype\n", "train.describe(include=[\"O\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What other information is given about the qualitative columns?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }